Call/return stack branch target predictor to multiple next sequential instruction addresses

ABSTRACT

A computer system includes a branch detection module and a branch predictor module. The branch detection module determines that a first program branch is a possible call branch having a next sequential instruction address (NSIA), and determines that a first routine branch is a possible return capable branch having a first routine instruction address that is detected as being offset from the NSIA. The branch predictor module determines that a second program branch is a possible call branch having an NSIA, and determines that a second routine branch is a predicted return branch having a predicted target instruction address based on the NSIA of the second program branch and the predicted offset.

BACKGROUND

The present disclosure relates to branch prediction and, in particular, to distance-based branch prediction and detection.

Branch prediction attempts to identify the location of branches in an instruction stream that is being executed by a processor to improve performance. Accuracy is important to avoid costly branch wrong restart penalties. Branch prediction can predict both the direction and target instruction address of a branch. Alternatively, without branch prediction, the pipeline may have to wait for branch resolution before proceeding along the taken or not taken path.

One current solution for predicting the target instruction address is to use a branch target buffer (BTB). The BTB stores what the target instruction address was for a branch the last time the branch was encountered. This approach works well for branches whose target instruction addresses are not a function of the path taken to arrive at the branch. However, for branches whose target instruction addresses are a function of the path taken to arrive at the branch, history-based structures may be used, such as a multi-target table (MTT) (sometimes referred to as a changing target buffer (CTB)). Other target prediction solutions exist to predict return-type branch targets to the next sequential instruction after their respective call-type branch.

SUMMARY

According to examples of the present disclosure, techniques including methods, systems, and/or computer program products provide for call/return stack branch target predictions to multiple sequential instruction addresses. According to a non-limiting embodiment, a computer data processing system comprises a branch detection module configured to determine that a first program branch is a possible call branch having a next sequential instruction address (NSIA). The branch detection module further determines that a first routine branch is a possible return capable branch having a first routine instruction address that is detected as being offset within a defined range of allowed offsets from the NSIA. The computer data processing system further includes a branch predictor module configured to determine that a second program branch is a possible call branch having an NSIA, and to determine that a second routine branch is a predicted return branch having a predicted target instruction address based on the NSIA of the second program branch and the predicted offset.

According to another non-limiting embodiment, a method is provided to perform a branch prediction in a computer data processing system. The method comprises determining that a first program branch is a possible call branch having a next sequential instruction address (NSIA), and determining that a first routine branch is a possible return capable branch having a first routine instruction address that is detected as being offset within a defined range of allowed offsets from the NSIA. The method further comprises determining that a second program branch is a possible call branch having a next sequential instruction address (NSIA), and determining that a second routine branch is a predicted return branch having a predicted target instruction address based on the NSIA of the second program branch and the predicted offset.

According to another non-limiting embodiment, a computer program product is provided to control an electronic device to perform a branch prediction in a computer data processing system. The computer program product comprises a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by an electronic computer processor to control the electronic device to perform operations comprising determining that a first program branch is a possible call branch having a next sequential instruction address (NSIA), and determining that a first routine branch is a possible return capable branch having a first routine instruction address that is detected as being offset within a defined range of allowed offsets from the NSIA. The operations further comprise determining that a second program branch is a possible call branch having a next sequential instruction address (NSIA), and determining that a second routine branch is a predicted return branch having a predicted target instruction address based on the NSIA of the second program branch and the predicted offset.

Additional features and advantages are realized through the techniques of the present disclosure. Other aspects are described in detail herein and are considered a part of the disclosure. For a better understanding of the present disclosure with the advantages and the features, refer to the following description and to the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other features, and advantages thereof, are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 is a block diagram of a processing system implementing a branch prediction and detection system according to examples of the present disclosure;

FIG. 2 is a block diagram of a branch prediction and detection system according to examples of the present disclosure;

FIG. 3 is a block diagram of a data processing system including a branch predictor unit configured to perform call/return stack target branch predictions to multiple next sequential instruction addresses according to a non-limiting embodiment;

FIG. 4 illustrates an environment capable of implementing branch prediction and detection in a program;

FIG. 5 illustrates an environment capable of implementing call/return stack target branch predictions to multiple next sequential instruction addresses according to a non-limiting embodiment;

FIG. 6 illustrates a flow diagram of a method for branch detection and prediction according to examples of the present disclosure; and

FIG. 7 illustrates a flow diagram of a method for branch detection according to examples of the present disclosure.

DETAILED DESCRIPTION

Branch prediction is a computing process which aims to find branches in an executed instruction stream in an effort to avoid costly branch wrong restart penalties. Branch prediction is typically performed by a processing unit (e.g., a branch predictor unit) that predicts both the direction and target instruction addresses (IAs) of a branch. One technique to predict the target IA is by using a branch target buffer (BTB), which stores what the target IA was for a branch the last time it was encountered. This approach works well for branches whose target IAs are not a function of the path taken to reach the branch. For branches whose target addresses are a function of the path taken to reach the branch, history-based structures such as a multi-target table (MTT), for example, can be employed.

Branch prediction techniques can also utilize a call-return stack (CRS) to predict the target of the branch that ends a routine. This is achieved using an instruction set that includes a call and return (call/return) instruction, which informs the branch predictor unit exactly when to “push and pop” target IAs from a stack structure.

Challenges can arise when attempting to implement call/return techniques in system architectures that do not employ well-defined CRS call/return instructions. For instance, implementing a call-return stack in hardware that is based on an instruction set architecture (ISA) that does not have well-defined call and return instructions is difficult due to the cost of accurately identifying call-like and return-like instructions. Even if an ISA is enriched with call-return type instructions, the full benefit of a call-return stack may not be realized.

When implementing call/return techniques, it is necessary to ensure that all files having associated calls and returns are compiled, linked together, and updated simultaneously to exploit the full benefit of the call/return stack. However, systems that do not employ well-defined CRS call/return instructions may be susceptible to scenarios where code is not routinely recompiled when an instruction set is enriched with new instructions, as is the case with call/return type instructions. Furthermore, such systems may recompile only select modules of code, which leads to cases where some code has the new call-return instruction but the module it links to does not have the corresponding paired instruction. Old code implemented in such systems also may not be capable of exploiting a new call/return stack prediction mechanism if the mechanism requires the use of the new instruction.

Furthermore, it has been discovered that systems that do not employ well-defined CRS call/return instructions may sometimes include code that is written in a manner where the return address is not the next sequential instruction address (NSIA), i.e., the address immediately following the call branch instruction. Instead, the return address may be offset by a number of bytes from the NSIA. For instance, rather than returning to NSIA, the return address may be NSIA+2 bytes, or NSIA+4 bytes, or NSIA+6 bytes, etc. Consequently, conventional call/return stack branch target prediction systems can incur costly wrong branch penalties because they are incapable of performing prediction to multiple next sequential instruction addresses (i.e., NSIA+n, where n is >0).

Embodiments of the present teachings described herein provide for target instruction address prediction for a branch whose target instruction address returns from a function. The present techniques enable identifying branches that exhibit call-like characteristics to determine return IAs, and identifying branches that exhibit return-like characteristics to apply accurate target IA prediction without architectural changes to the underlying code. Unlike conventional call/return stack branch target prediction systems, one or more embodiments of the present invention provide a branch predictor unit configured to perform call/return stack target branch predictions to multiple next sequential instruction addresses. Accordingly, the data processing system employing the inventive branch predictor unit can avoid costly wrong branch penalties when the return address is not the next sequential instruction address immediately following the call branch function.

With reference now to FIG. 1, a block diagram of a processing system 20 for implementing the techniques described herein is illustrated according to a non-limiting embodiment. In examples, processing system 20 has one or more central processing units (processors) 21a, 21b, 21c, etc. (collectively or generically referred to as processor(s) 21 and/or as processing device(s)). In aspects of the present disclosure, each processor 21 may include a reduced instruction set computer (RISC) microprocessor. Processors 21 are coupled to system memory (e.g., random access memory (RAM) 24) and various other components via a system bus 33. Read only memory (ROM) 22 is coupled to system bus 33 and may include a basic input/output system (BIOS), which controls certain basic functions of processing system 20.

Further illustrated are an input/output (I/O) adapter 27 and a communications adapter 26 coupled to system bus 33. I/O adapter 27 may be a small computer system interface (SCSI) adapter that communicates with a hard disk 23 and/or a tape storage drive 25 or any other similar component. I/O adapter 27, hard disk 23, and tape storage device 25 are collectively referred to herein as mass storage 34. Operating system 40 for execution on processing system 20 may be stored in mass storage 34. A network adapter 26 interconnects system bus 33 with an outside network 36 enabling processing system 20 to communicate with other such systems.

A display (e.g., a display monitor) 35 is connected to system bus 33 by display adapter 32, which may include a graphics adapter to improve the performance of graphics intensive applications and a video controller. In one aspect of the present disclosure, adapters 26, 27, and/or 32 may be connected to one or more I/O buses that are connected to system bus 33 via an intermediate bus bridge (not shown). Suitable I/O buses for connecting peripheral devices such as hard disk controllers, network adapters, and graphics adapters typically include common protocols, such as the Peripheral Component Interconnect (PCI). Additional input/output devices are shown as connected to system bus 33 via user interface adapter 28 and display adapter 32. A keyboard 29, mouse 30, and speaker 31 may be interconnected to system bus 33 via user interface adapter 28, which may include, for example, a Super I/O chip integrating multiple device adapters into a single integrated circuit.

In some aspects of the present disclosure, processing system 20 includes a graphics processing unit 37. Graphics processing unit 37 is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display. In general, graphics processing unit 37 is very efficient at manipulating computer graphics and image processing, and has a highly parallel structure that makes it more effective than general-purpose CPUs for algorithms where processing of large blocks of data is done in parallel.

Thus, as configured herein, processing system 20 includes processing capability in the form of processors 21, storage capability including system memory (e.g., RAM 24) and mass storage 34, input means such as keyboard 29 and mouse 30, and output capability including speaker 31 and display 35. In some aspects of the present disclosure, a portion of system memory (e.g., RAM 24) and mass storage 34 collectively store an operating system such as the AIX® operating system from IBM Corporation to coordinate the functions of the various components shown in processing system 20.

Referring to FIG. 2, a branch prediction and detection system 200 included in the data processing system 20 is illustrated according to a non-limiting embodiment. The branch prediction and detection system 200 includes a level 1 (L1) instruction cache 202 from which instruction fetch unit (IFU) 206 fetches instructions. IFU 206 may support a multi-cycle (e.g., three-cycle) branch scan loop to facilitate scanning a fetched instruction group for branch instructions predicted ‘taken’, computing targets of the predicted ‘taken’ branches, and determining if a branch instruction is an unconditional branch or a ‘taken’ branch. Fetched instructions are also provided to branch prediction unit (BPU) 204, which predicts whether a branch is ‘taken’ or ‘not taken’ and a target of predicted ‘taken’ branches.

In one or more embodiments, BPU 204 includes a branch direction predictor (not shown in FIG. 2) that implements a local branch history table (LBHT) array, global branch history table (GBHT) array, and a global selection (GSEL) array. The LBHT, GBHT, and GSEL arrays (not shown) provide branch direction predictions for all instructions in a fetch group (that may include up to eight instructions). The LBHT, GBHT, and GSEL arrays are shared by all threads. The LBHT array may be directly indexed by bits (e.g., ten bits) from an instruction fetch address provided by an instruction address (iaddr) such as an instruction fetch address register (IFAR). The GBHT and GSEL arrays may be indexed by the instruction fetch address hashed with a global history vector (GHV).

The GHV is a 20-bit vector that tracks a “taken/not taken” history of a previous number of branches (e.g., the last 20 branches). In at least one non-limiting embodiment, a “1” bit is shifted into the GHV for every taken branch while a “0” bit is shifted into the GHV for every instruction group. For example, a traditional GHV for the last 20 branches could be: “10101011110001110011”. In at least one embodiment, the hash of a given GHV value is used as an index into the GBHT. Accordingly, the GBHT can indicate, based on the last time the bit pattern represented by the hash was detected, whether the next branch was taken or not taken.
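
As a rough illustration of the history vector described above, the following Python sketch models a 20-bit GHV as a shift register and derives a table index from it. The text only states that the fetch address is "hashed" with the GHV, so the XOR hash, the 10-bit index width, the choice to shift new bits into the low-order position, and the names `shift_ghv` and `gbht_index` are all assumptions made for illustration.

```python
GHV_BITS = 20  # vector width stated in the text
GHV_MASK = (1 << GHV_BITS) - 1


def shift_ghv(ghv: int, bit: int) -> int:
    """Shift one history bit into the low-order end of the GHV (assumed direction)."""
    return ((ghv << 1) | (bit & 1)) & GHV_MASK


def gbht_index(fetch_addr: int, ghv: int, index_bits: int = 10) -> int:
    """Hypothetical hash: XOR the fetch address with the GHV and keep the low bits."""
    return (fetch_addr ^ ghv) & ((1 << index_bits) - 1)


# Replay the example history from the text, oldest bit first.
ghv = 0
for ch in "10101011110001110011":
    ghv = shift_ghv(ghv, int(ch))

index = gbht_index(0x1234, ghv)  # 0x1234 is an arbitrary example fetch address
```

Because the vector is masked to 20 bits on every shift, the oldest history bit falls off the high-order end once 20 newer bits have been recorded.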

The IFU 206 provides fetched instructions to instruction decode unit (IDU) 208 for decoding. IDU 208 provides decoded instructions to instruction dispatch unit 210 for dispatch. Following execution of dispatched instructions, instruction dispatch unit 210 provides the results of the executed dispatched instructions to completion unit 212. Depending on the type of instruction, a dispatched instruction is provided to branch issue queue 218, condition register (CR) issue queue 216, or unified issue queue 214 for execution in an appropriate execution unit. The branch issue queue 218 stores dispatched branch instructions for branch execution unit 220. CR issue queue 216 stores dispatched CR instructions for CR execution unit 222. Unified issue queue 214 stores instructions for floating point execution unit(s) 228, fixed point execution unit(s) 226, load/store execution unit(s) 224, among other execution units.

FIG. 3 illustrates a block diagram of a branch prediction unit (BPU) 204 configured to perform call/return stack target branch predictions to multiple next sequential instruction addresses according to examples of the present disclosure. The various components, modules, engines, etc. described regarding FIG. 3 may be implemented as instructions stored on a computer-readable storage medium, as hardware modules, as special-purpose hardware (e.g., application specific hardware, application specific integrated circuits (ASICs), as embedded controllers, hardwired circuitry, etc.), or as some combination or combinations of these. In examples, the engine(s) described herein may be a combination of hardware and programming. The programming may be processor executable instructions stored on a tangible memory, and the hardware may include a processing device for executing those instructions. Thus, system memory can store program instructions that when executed by processing system 20 implement the engines described herein. Other engines may also be utilized to include other features and functionality described in other examples herein.

The BPU 204 includes a branch detection module 250 and a branch prediction module 252. The branch detection module 250 and/or the branch prediction module 252 can be constructed as an electronic hardware controller that includes memory and a processor configured to execute algorithms and computer-readable program instructions stored in the memory. The branch detection module 250 performs a branch detection process to determine call-like and return-like branch instructions, for example, at completion time (i.e., past branch instruction execution time). That is, the branch detection method occurs at the point where it is known whether a branch was actually taken (i.e., a non-speculative point) and what its correct target IA is.

In at least one non-limiting embodiment, for every completed taken branch, first, a distance D between the branch's instruction address (IA) and its target IA is determined. The distance may be a number of bytes, halfwords, etc. This may be done as an exact or imprecise compare. If the distance is greater than a threshold (Th), the next sequential IA (NSIA) of the branch instruction is saved in a completion_NSIA side register and marked as valid, because a potential call has been found and its return point is known. Next, the target IA of each successive branch is compared with the saved and valid completion_NSIA side register. Some implementations might include the distance D comparison at this step. If the values match, a possible return type branch is indicated. The completion_NSIA side register is marked as invalid, completing the call-return pair, and the possible return is recorded in a branch prediction table. If the values do not match, a possible return type branch is not indicated. In this example implementation, only return-like branches are marked in a branch prediction table. Call-like branches can be marked too in other examples to further improve the design and implementation efficiency.

In another non-limiting embodiment, for every completed taken branch, the branch detection module 250 evaluates the difference between its branch instruction address (IA) and its target instruction address (IA_D). When the difference (i.e., IA-IA_D) is greater than a threshold value (Th), the branch detection module 250 saves the next sequential instruction address (NSIA) in a completion NSIA side register (not shown in FIG. 3) and marks it valid.

The branch detection module 250 then compares the target IA of a subsequent completed taken branch with the saved and valid completion NSIA side register. In one or more embodiments, the branch detection module 250 compares the target IA to all NSIA offset values (e.g., NSIA+2, NSIA+4, NSIA+6, NSIA+n). When the target IA matches a given NSIA offset value, the branch detection module 250 determines a match.

When a match exists, the branch detection module 250 determines that a possible return type branch has been found. Accordingly, the branch detection module 250 invalidates the completion NSIA side register, and records the possible return function along with its offset (e.g., 2 bytes, 4 bytes, etc.) in a branch prediction logic (BPL) table (not shown in FIG. 3).
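
The offset comparison described above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware logic itself: the function name `match_return_offset`, the specific set of allowed offsets, and the inclusion of offset 0 (a plain return to the NSIA) are assumptions.

```python
from typing import Optional, Sequence

# Assumed range of allowed offsets in bytes; the text gives NSIA+2, NSIA+4,
# NSIA+6, ..., NSIA+n as examples, and 0 models a return to the NSIA itself.
ALLOWED_OFFSETS = (0, 2, 4, 6)


def match_return_offset(
    target_ia: int,
    completion_nsia: int,
    valid: bool,
    offsets: Sequence[int] = ALLOWED_OFFSETS,
) -> Optional[int]:
    """Return the matching offset if target_ia equals NSIA + offset, else None."""
    if not valid:
        return None  # no pending call-like branch to pair with
    delta = target_ia - completion_nsia
    return delta if delta in offsets else None
```

A non-`None` result corresponds to the "match" case: the caller would then invalidate the side register and record the returned offset alongside the branch.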

For every predicted taken branch not marked as a “return type” branch, the branch prediction module 252 evaluates the difference between its branch instruction address (IA) and its target instruction address (IA_D). When the difference (i.e., IA-IA_D) is greater than a threshold value (Th), the branch prediction module 252 saves the next sequential instruction address (NSIA) in a prediction NSIA side register (not shown in FIG. 3) and marks it valid.

For every predicted taken branch marked as a “return type” branch, the branch prediction module 252 analyzes the status of the prediction NSIA side register. When the prediction NSIA side register indicates valid, the branch prediction module 252 uses the value in the prediction NSIA side register plus the offset in the branch target buffer (BTB) as the target IA. When, however, the prediction NSIA side register indicates invalid, the branch prediction module 252 relies on existing target prediction structures. Exemplary existing target prediction structures could be a multi-target table (MTT) or a branch target buffer (BTB).
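
The prediction-side behavior of the two preceding paragraphs can be modeled as follows. This is a minimal sketch under stated assumptions: `BtbEntry`, `PredictionNsia`, and `predict_target` are hypothetical names, the instruction length `ilc` is passed in explicitly, the distance check uses an absolute value, and the side register is left valid after use because the text does not say whether prediction invalidates it.

```python
from dataclasses import dataclass


@dataclass
class BtbEntry:
    """Hypothetical BTB metadata for one branch."""
    target_ia: int          # last observed target
    is_return: bool = False  # set by the detection logic
    return_offset: int = 0   # detected offset from the NSIA, in bytes


@dataclass
class PredictionNsia:
    """Model of the prediction NSIA side register and its valid bit."""
    value: int = 0
    valid: bool = False


def predict_target(branch_ia: int, ilc: int, entry: BtbEntry,
                   side: PredictionNsia, threshold: int) -> int:
    """Predict the target IA of a predicted-taken branch."""
    if entry.is_return:
        if side.valid:
            # Return-type branch with a valid side register: NSIA + offset.
            return side.value + entry.return_offset
        # Otherwise fall back to the existing structure (here, the BTB target).
        return entry.target_ia
    # Not a return: a far-enough target marks a possible call, so save its NSIA.
    if abs(entry.target_ia - branch_ia) > threshold:  # abs() is an assumption
        side.value = branch_ia + ilc
        side.valid = True
    return entry.target_ia
```

With a call-like branch at x102 targeting x400 and a return-type entry carrying an offset of 2, the second prediction yields x106 regardless of the stale target stored in the return branch's own entry.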

The BTB can include a multitude of metadata associated with a particular branch. In this scenario, the target address is one piece of metadata, while the return offset is another piece of metadata. In at least one non-limiting embodiment, identified returns and identified calls can be marked by setting a bit in the BTB. A detected offset can also be written into the prediction structure either with the call branch or the return branch. Accordingly, the offset can be detected from the NSIA that the return goes to, which then can be remembered in order to subsequently predict the offset.

Turning now to FIG. 4, an example environment 400 implementing branch detection in a program is illustrated. The environment 400 implements techniques as disclosed herein to detect and predict likely branch pairs that exhibit call-return characteristics. Although illustrated as a one-entry deep queue, the present techniques apply to stacks with depth larger than one. In a stack implementation, there could be multiple instances of NSIA registers and valid bits.

In particular, FIG. 4 illustrates the program 402 and a function 404. A target instruction address (IA) of Branch X (i.e., “call” function 401) occurs at a particular IA (e.g., x102) which is a distance D bytes away from Branch X's IA. If the distance D is greater than the distance threshold T, a next sequential instruction address (NSIA) (e.g., x104) after the IA for Branch X is saved. The target IA of a subsequent taken branch is a distance ‘a’ bytes away from that taken branch's IA. In this example, the distance ‘a’ is less than the distance threshold T, and therefore, the completion_NSIA immediately following the call function Branch X is not replaced. The target of the taken branch does not match the NSIA (i.e., x104) following Branch X. However, the target of Branch Y (i.e., “return” function 410 with IA x104) does match the NSIA immediately following Branch X. This indicates a likely call-return pair.

Turning to FIG. 5, an environment 500 capable of implementing call/return stack target branch predictions to multiple next sequential instruction addresses is illustrated according to a non-limiting embodiment. The environment 500 allows for both detecting a possible return capable branch and predicting multiple sequential instruction addresses (i.e., an offset NSIA) of a return capable branch.

In terms of detecting a possible return capable branch, for example, a program 502 includes a Branch X 501 that completes at address x102. Accordingly, the branch detection module 250 compares the distance (D) between Branch X's IA (e.g., x102) and its target IA (e.g., x400). The distance (D) can be a byte differential between Branch X's IA (e.g., x102) and its target IA (e.g., x400). In this example, the distance between x102 and x400 exceeds the distance threshold (Th). Therefore, the branch detection module 250 determines that Branch X 501 is a possible caller branch. Accordingly, the branch detection module 250 saves the NSIA (e.g., x104) of Branch X 501 in the call-return stack (CRS) and the bit corresponding to x104 is marked as valid.

The routine Z 504 is then executed and encounters another branch (Branch Y 510) at address x406, which is determined as a possible return capable branch. The branch detection module 250 then confirms that the completion NSIA is valid, and proceeds to check the target IA of the branch at address x406 to determine whether it matches the NSIA (e.g., x104) stored in the CRS. In this example, the target IA of the branch at address x406 does not match the NSIA (e.g., x104), but does match an offset of the NSIA (e.g., x106). Accordingly, the branch detection module 250 updates the branch prediction logic (BPL) table(s) to indicate that Branch Y 510 at routine address x406 is a possible return capable branch with an offset NSIA (e.g., NSIA plus 2 bytes, i.e., x106).

In terms of predicting multiple sequential instruction addresses (i.e., an offset NSIA) of a taken branch, the branch prediction module 252 searches one or more BPL tables to detect a taken branch. In this example, the branch prediction module 252 identifies call Branch X 501 at address x102, and predicts its target IA (e.g., x400 in routine Z 504). The branch prediction module 252 then compares the distance (D) between Branch X's IA (e.g., x102) and its predicted target IA (e.g., x400). In this example, the difference between x102 and x400 exceeds the distance threshold (Th). Therefore, the NSIA (e.g., x104) of call Branch X 501 is saved in the call-return stack (CRS).

The routine Z 504 is then executed and encounters another taken branch (Branch Y 510) at address x406. In response to detecting Branch Y 510 at address x406, the branch prediction module 252 checks the BPL table(s) and determines that the taken branch at address x406 was previously identified (i.e., by the branch detection module) as a possible return capable branch. The branch prediction module 252 also determines in this example that the prediction NSIA register is valid. Accordingly, the branch prediction module 252 determines that it can use its CRS, and proceeds to determine which offset value (if any) should be added to the NSIA value (e.g., x104) previously stored in the CRS. In this example, the metadata included with the call Branch X 501 would indicate from the previously described detection operation that the offset is 2 bytes. Accordingly, the branch prediction module 252 adds 2 bytes to the initial NSIA value (x104) to predict that the target IA of the return Branch Y 510 at routine address x406 is x106 (i.e., x104+2).

Additional details of the branch detection and prediction are disclosed below with reference to FIGS. 6 and 7, respectively. Additional processes also may be included, and it should be understood that the processes depicted in FIGS. 6 and 7 represent illustrations, and that other processes may be added or existing processes may be removed, modified, or rearranged without departing from the scope and spirit of the present disclosure.

FIG. 6 illustrates a flow diagram of a method for branch detection according to examples of the present disclosure. The method begins at block 600 by monitoring instructions executed by the system. At block 602, the system detects that a branch was taken (e.g., Branch Z) and is completed. At decision block 604, it is determined whether (A) the taken branch's (Branch Z) target IA (Z_tgIA) is equal to the value stored in the completion_NSIA register with no offset (i.e., comp_NSIA+0) and (B) whether the completion NSIA register is valid (i.e., whether a potential return is found). When it is determined that the taken branch's (Branch Z) target IA (Z_tgIA) is equal to the value stored in the completion_NSIA register with no offset (i.e., comp_NSIA+0), and the completion NSIA register is valid (i.e., a potential return is found), a return capable branch is found at the register with no offset (i.e., offset=0) at block 606. Accordingly, at block 608 Branch Z is marked as return capable, and the completion_NSIA register with no offset (i.e., comp_NSIA+0) is set as invalid. Marking Branch Z as return capable may include marking Branch Z's metadata as return capable, for example, in a branch prediction logic (BPL) table. The method can then return to block 600 to continue monitoring additional instructions, or in other examples can terminate.

When, however, at decision block 604 it is determined that Branch Z's target IA is not equal to the value stored in the comp_NSIA+0 register and/or it is determined that the comp_NSIA+0 register is invalid, the method proceeds to compare the target IA against all remaining offset comp_NSIA registers (e.g., comp_NSIA+2, comp_NSIA+4, comp_NSIA+n). For instance, at decision block 610 a determination is made as to whether (A) the taken branch's (Branch Z) target IA (Z_tgIA) is equal to the value stored in the completion_NSIA register with an offset (e.g., comp_NSIA+2) and (B) whether the completion NSIA register is valid (i.e., whether a potential return is found). When it is determined that the taken branch's (Branch Z) target IA (Z_tgIA) is equal to the value stored in the completion_NSIA register with the offset (e.g., comp_NSIA+2), and the completion NSIA register is valid, a return capable branch is found at the register with the offset (e.g., offset=2) at block 612. The system, therefore, determines that a return capable branch is found at the currently analyzed offset (e.g., offset=2). In other words, the system determines that Branch Z's target is the NSIA plus two bytes (i.e., NSIA+2). Accordingly, at block 614 Branch Z is marked as return capable, and the completion_NSIA register with the offset (e.g., comp_NSIA+2) is set as invalid. The method can then return to block 600 to continue monitoring additional instructions, or in other examples can terminate.

When, however, at decision block 610 it is determined that Branch Z's target IA is not equal to the value stored in the offset comp_NSIA+2 register and/or it is determined that the comp_NSIA+2 register is invalid, the method proceeds to compare the target IA against the next offset comp_NSIA register (e.g., comp_NSIA+4). For instance, at block 616 a determination is made as to whether (A) the taken branch's (Branch Z) target IA (Z_tgIA) is equal to the value stored in the next offset completion_NSIA register (e.g., comp_NSIA+4) and (B) whether the completion_NSIA register is valid (i.e., whether a potential return is found). When both conditions are met, a return capable branch is found at the register with the next offset (e.g., offset=4) at block 618. Accordingly, at block 620 Branch Z is marked as return capable, and the completion_NSIA register with the next offset (e.g., comp_NSIA+4) is set as invalid. The method can then return to block 600 to continue monitoring additional instructions, or in other examples can terminate.

Assuming in this example that comp_NSIA+4 is the last offset completion register, when Branch Z's target IA is determined not to equal the value stored in the last offset completion_NSIA register (e.g., comp_NSIA+4) at operation 616, the distance between the target IA for Branch Z (Z_tgIA) and Branch Z's IA (Z_IA) is compared to a threshold distance (X) at operation 622. When the target IA for Branch Z is not greater than the threshold distance away from Branch Z's IA, the method returns to operation 600 and continues monitoring instructions. Otherwise, when the target IA for Branch Z is greater than the threshold distance away from Branch Z's IA, a possible caller branch is identified at block 624. It should be appreciated that, in examples, the distance between the target IA and Branch Z's IA is determined using an absolute value function. Accordingly, at operation 626 comp_NSIA is set equal to Branch Z's IA (Z_IA) with an added instruction length code (ILC) (e.g., 2 bytes, 4 bytes, 6 bytes, 8 bytes, etc.), and the comp_NSIA is set as valid. Utilizing the ILC allows for computing the actual NSIA. The method can then return to block 600 and continue monitoring additional instructions, or in other examples can terminate.
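The detection flow of FIG. 6 can be sketched in software as follows. This is a minimal illustration rather than the disclosed hardware: it assumes a single comp_NSIA value with one valid bit, against which the target IA is compared at each allowed offset. The names `Detector`, `ALLOWED_OFFSETS`, `THRESHOLD`, and `on_taken_branch` are hypothetical, as are the chosen offsets (0, 2, and 4 bytes) and the threshold value.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

ALLOWED_OFFSETS = (0, 2, 4)  # assumed set of stored offsets, in bytes
THRESHOLD = 1024             # assumed threshold distance X, in bytes


@dataclass
class Detector:
    comp_nsia: int = 0        # completion NSIA register
    comp_valid: bool = False  # valid bit for comp_nsia
    # Branch IA -> detected return offset; stands in for BPL metadata.
    return_capable: Dict[int, int] = field(default_factory=dict)

    def on_taken_branch(self, ia: int, target: int, ilc: int) -> Optional[int]:
        """Handle one completed taken branch (block 602).

        Returns the matched offset when the branch is found to be
        return capable, otherwise None.
        """
        if self.comp_valid:
            # Blocks 604, 610, 616: compare the target to NSIA + offset.
            for off in ALLOWED_OFFSETS:
                if target == self.comp_nsia + off:
                    # Blocks 608, 614, 620: mark and invalidate.
                    self.return_capable[ia] = off
                    self.comp_valid = False
                    return off
        # Block 622: absolute distance between target IA and branch IA.
        if abs(target - ia) > THRESHOLD:
            # Blocks 624, 626: possible caller; record its NSIA.
            self.comp_nsia = ia + ilc
            self.comp_valid = True
        return None
```

Under these assumptions, a taken branch at IA 0x1000 with an ILC of 4 and a target of 0x9000 is treated as a possible caller and sets comp_NSIA to 0x1004. A later taken branch whose target is 0x1006 is then detected as return capable with an offset of 2.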

Additional processes also may be included, and it should be understood that the processes depicted in FIGS. 6 and 7 represent illustrations, and that other processes may be added or existing processes may be removed, modified, or rearranged without departing from the scope and spirit of the present disclosure.

FIG. 7 illustrates a method of performing branch prediction according to examples of the present disclosure. The method begins by performing a branch prediction logic (BPL) search (e.g., searching one or more BPL tables) at block 700. At block 702, a taken branch (e.g., Branch Y) is predicted. At decision block 704, the metadata of the taken branch (e.g., Branch Y) is analyzed to determine whether Branch Y is return capable and whether a predicted_NSIA side register is valid. If so, the system determines that the current taken branch is return capable, and proceeds to determine the target IA (i.e., NSIA+0 bytes, NSIA+2 bytes, NSIA+4 bytes, etc.) that matches the current taken branch.

For instance, at operation 706, the system determines whether taken Branch Y matches the NSIA with no offset (i.e., offset=0). When a match is found (i.e., offset=0), the predicted_NSIA side register is set as the target IA for Branch Y (Y_tgIA) at operation 708, and the corresponding predicted NSIA side register with no offset (pred_NSIA+0) is set as invalid at block 710. Accordingly, the method can return to operation 700 and search the BPL, or in other examples can terminate.

However, when no match is found (i.e., the offset does not equal zero) at block 706, the method determines whether the return address is an offset of the NSIA, i.e., whether the return IA is offset from the NSIA by 2 bytes, 4 bytes, 6 bytes . . . n bytes. For instance, the method proceeds to block 712 to determine whether the NSIA is offset by 2 bytes (i.e., whether the offset=2). When the offset=2, the predicted_NSIA side register is set as the target IA for Branch Y (Y_tgIA) at block 714, and the corresponding predicted NSIA side register with an offset (pred_NSIA+2) is set as invalid at block 710. Accordingly, the method can return to block 700 and search the BPL, or in other examples can terminate. Otherwise, the method proceeds to block 716.

At block 716, the method determines that the offset is the last available stored offset because the BTB tables store only a fixed number of NSIA offsets: +0, +2 bytes, +4 bytes, +6 bytes, +8 bytes . . . +n bytes. That is, when block 716 is reached, the BTB has indicated that the branch is a return-type branch, the pred_NSIA register is valid, and the method has moved past the second-to-last NSIA offset, so it is determined that the offset is the last available stored offset. At operation 718, the predicted_NSIA side register is set as the target IA for Branch Y (Y_tgIA), and the corresponding predicted NSIA side register with the last stored offset (pred_NSIA+n) is set as invalid at block 710. Accordingly, the method can return to block 700 and search the BPL, or in other examples can terminate.
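Because only a fixed number of NSIA offsets are stored, the offset can be thought of as a small encoded field. The sketch below shows one possible encoding and is purely illustrative: the 2-bit field width, the 2-byte step, and the helper names `encode_offset` and `decode_offset` are assumptions and are not taken from the disclosure.

```python
from typing import Optional

OFFSET_BITS = 2  # assumed field width: four encodable offsets
STEP = 2         # assumed offset granularity, in bytes


def encode_offset(offset_bytes: int) -> Optional[int]:
    """Return the field code for an offset, or None if it cannot be stored."""
    if offset_bytes < 0 or offset_bytes % STEP != 0:
        return None
    code = offset_bytes // STEP
    # Offsets beyond the last stored slot are not representable.
    return code if code < (1 << OFFSET_BITS) else None


def decode_offset(code: int) -> int:
    """Return the offset in bytes represented by a field code."""
    return code * STEP
```

With these assumed parameters the storable offsets are +0, +2, +4, and +6 bytes. A return whose target lies outside that set would simply not be marked as return capable in this sketch.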

When, however, at decision block 704 the metadata of Branch Y does not indicate that it is return capable and/or the predicted_NSIA side register is not valid, the method proceeds to block 720. At block 720, a determination is made as to whether the target IA for Branch Y (Y_tgIA) is greater than a threshold distance (X) away from Branch Y's IA. When Y_tgIA is not greater than the threshold distance (X) away from Branch Y's IA, the method returns to search the BPL at block 700.

However, when Y_tgIA is greater than the threshold distance (X) away from Branch Y's IA, a possible caller branch is found at block 722. It should be appreciated that, in examples, the distance between the target IA and Branch Y's IA is determined using an absolute value function. Accordingly, at operation 724 pred_NSIA is set equal to Branch Y's IA (Y_IA) with an added instruction length code (ILC) (e.g., 2 bytes, 4 bytes, 6 bytes, 8 bytes, etc.), and the pred_NSIA is set as valid. In one or more embodiments, the return capable branch uses the predicted_NSIA side register's value if the predicted branch had previously taken a wrong target (i.e., in a previous prediction, the same branch's target IA was incorrect). The method can then return to block 700 and search the BPL, or in other examples can terminate.
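The prediction flow of FIG. 7 can be sketched in the same style. This is a minimal illustration under stated assumptions, not the disclosed implementation: the return offset is assumed to arrive as part of the branch's metadata (`return_offset`, with `None` meaning the branch is not marked return capable), the predicted target is assumed to be pred_NSIA plus that offset, and the names `Predictor`, `predict`, `btb_target`, and `THRESHOLD` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

THRESHOLD = 1024  # assumed threshold distance X, in bytes


@dataclass
class Predictor:
    pred_nsia: int = 0        # predicted_NSIA side register
    pred_valid: bool = False  # valid bit for pred_nsia

    def predict(self, ia: int, btb_target: int, ilc: int,
                return_offset: Optional[int] = None) -> int:
        """Return the predicted target IA for a predicted-taken branch."""
        # Block 704: return capable metadata and a valid side register.
        if return_offset is not None and self.pred_valid:
            # Blocks 708, 714, 718: target comes from the side register.
            target = self.pred_nsia + return_offset
            # Block 710: the side register is consumed.
            self.pred_valid = False
            return target
        # Block 720: absolute distance between target IA and branch IA.
        if abs(btb_target - ia) > THRESHOLD:
            # Blocks 722, 724: possible caller; record its NSIA.
            self.pred_nsia = ia + ilc
            self.pred_valid = True
        return btb_target
```

The point of the sketch is that the return's target follows the most recent caller rather than the target stored the last time the return was encountered. For example, after a possible caller at IA 0x2000 with an ILC of 6, a return capable branch with a stored offset of 2 is predicted to target 0x2008, regardless of its stored BTB target.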

In addition to typical metadata stored in each branch prediction entry, such as a branch direction history, etc., each stored branch also saves its instruction length code (ILC), indicating the length of the instruction (e.g., in bytes). This may not be necessary for some architectures with fixed instruction lengths or configurations where the branch is represented in the branch prediction entry by its ending instruction address, for example the last byte address of the instruction.
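The two ways of arriving at the NSIA mentioned above can be illustrated with a short sketch. The function names are hypothetical, and byte-granular addresses are assumed.

```python
def nsia_from_start(branch_ia: int, ilc: int) -> int:
    """NSIA when the entry stores the branch's starting IA and its ILC."""
    return branch_ia + ilc


def nsia_from_end(last_byte_ia: int) -> int:
    """NSIA when the entry stores the address of the branch's last byte."""
    return last_byte_ia + 1
```

For a 6-byte branch starting at 0x1000, the last byte is at 0x1005, and both forms give an NSIA of 0x1006. This is why the ILC need not be stored when the entry already records the ending address.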

Additional processes also may be included, and it should be understood that the processes depicted in the figures described herein represent illustrations. Other processes may also be added or existing processes may be removed, modified, or rearranged without departing from the scope and spirit of the present disclosure.

The present techniques may be implemented as a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some examples, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.

Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to aspects of the present disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various aspects of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

The descriptions of the various examples of the present disclosure have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described techniques. The terminology used herein was chosen to best explain the principles of the present techniques, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the techniques disclosed herein.

What is claimed is:
 1. A computer data processing system comprising: a branch detection module configured to determine a first program including a first program branch that is a possible call branch preceding a next sequential instruction address (NSIA) included in the first program, and to determine a first routine including a first routine branch and to determine the first routine branch is a possible return capable branch, the first routine branch located at a first routine instruction address included in the first routine and having a target instruction address included in the first program that is detected as being offset within a defined range of allowed offsets from the NSIA; and a branch predictor module configured to determine the first program includes a second program branch that is a possible call branch having a NSIA included in the first program, and to determine the first routine includes a second routine branch that is a predicted return branch having a predicted target instruction address included in the first program based on the NSIA of the second program branch and the offset.
 2. The computer data processing system of claim 1, wherein the branch predictor module detects a second program branch that is a possible call branch, and determines the predicted target instruction address of the second routine branch based on the NSIA of the first program branch and metadata included with the second program branch.
 3. The computer data processing system of claim 2, wherein the predicted target instruction address of the second routine branch is an offset with respect to the NSIA.
 4. The computer data processing system of claim 3, wherein the offset includes at least one byte that is added to the NSIA.
 5. The computer data processing system of claim 4, wherein the branch detection module determines the NSIA as a next sequential instruction address immediately following a program instruction address of the first program branch.
 6. The computer data processing system of claim 1, wherein determining the first program branch is a possible call branch having a next sequential instruction address (NSIA) is based on a comparison of a distance between a first target instruction address of the first program branch and a first instruction address of the first program branch against a distance threshold value, and wherein the branch detection module determines the possible call branch in response to the distance being greater than the distance threshold value.
 7. The computer data processing system of claim 4, wherein the metadata indicates the at least one byte defining the offset.
 8. A method of performing a branch prediction in a computer data processing system, the method comprising: determining, via a branch detection module, a first program including a first program branch that is a possible call branch preceding a next sequential instruction address (NSIA) included in the first program; determining, via the branch detection module, a first routine including a first routine branch, and determining the first routine branch is a possible return capable branch, the first routine branch located at a first routine instruction address included in the first routine and having a target instruction address included in the first program that is detected as being offset within a defined range of allowed offsets from the NSIA; and determining, via a branch predictor module, the first program includes a second program branch that is a possible call branch having a NSIA included in the first program; and determining, via the branch predictor module, the first routine includes a second routine branch that is a predicted return branch having a predicted target instruction address included in the first program based on the NSIA of the second program branch and the predicted offset.
 9. The method of claim 8, further comprising determining the predicted target instruction address of the second routine branch based on the NSIA of the first program branch and metadata included with the second program branch.
 10. The method of claim 9, further comprising calculating the predicted target instruction address of the second routine branch as an offset with respect to the NSIA.
 11. The method of claim 10, wherein the offset includes at least one byte that is added to the NSIA.
 12. The method of claim 11, further comprising determining, via the branch detection module, the NSIA as a next sequential instruction address immediately following a program instruction address of the first program branch.
 13. The method of claim 8, further comprising: comparing a distance between a first target instruction address of the first program branch and a first instruction address of the first program branch against a distance threshold value; and determining, via the branch detection module, the possible call branch has a next sequential instruction address (NSIA) in response to the distance being greater than the distance threshold value.
 14. The method of claim 11, further comprising using the metadata to indicate the at least one byte defining the offset.
 15. A computer program product to control an electronic device to perform a branch prediction in a computer data processing system, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by an electronic computer processor to control the electronic device to perform operations comprising: determining, via a branch detection module, a first program including a first program branch that is a possible call branch preceding a next sequential instruction address (NSIA) included in the first program; determining, via the branch detection module, a first routine including a first routine branch, and determining the first routine branch is a possible return capable branch, the first routine branch located at a first routine instruction address included in the first routine and having a target instruction address included in the first program that is detected as being offset within a defined range of allowed offsets from the NSIA; and determining, via a branch predictor module, that the first program includes a second program branch that is a possible call branch having a NSIA included in the first program; and determining, via the branch predictor module, that the first routine includes a second routine branch that is a predicted return branch having a predicted target instruction address included in the first program based on the NSIA of the second program branch and the predicted offset.
 16. The computer program product of claim 15, wherein the operations further comprise determining the predicted target instruction address of the second routine branch based on the NSIA of the first program branch and metadata included with the second program branch.
 17. The computer program product of claim 16, wherein the operations further comprise calculating the predicted target instruction address of the second routine branch as an offset with respect to the NSIA.
 18. The computer program product of claim 17, wherein the offset includes at least one byte that is added to the NSIA.
 19. The computer program product of claim 18, wherein the operations further comprise determining, via the branch detection module, the NSIA as a next sequential instruction address immediately following a program instruction address of the first program branch.
 20. The computer program product of claim 15, wherein the operations further comprise: comparing a distance between a first target instruction address of the first program branch and a first instruction address of the first program branch against a distance threshold value; and determining, via the branch detection module, the possible call branch in response to the distance being greater than the distance threshold value.